[MISC] Switch dedupe contact sort to use Quadrants bitonic sort#2853

Merged
duburcqa merged 10 commits into
Genesis-Embodied-AI:mainfrom
hughperkins:hp/use-bitonic-sort-kv
Jun 5, 2026

Conversation

@hughperkins
Collaborator

Replaces the inlined 15-stage bitonic compare-exchange schedule in func_clamp_prune_and_sort_contacts_coop (phase 1a) with a one-line call to the new Quadrants subgroup primitive:

my_key, my_idx = qd.simt.subgroup.bitonic_sort_kv_tiled(my_key, my_idx, 5)

The primitive (added in quadrants hp/bitonic-sort-kv) is a @qd.func that inlines at compile time and unrolls the same 15 compare-exchange stages this code used to write inline, so on CUDA the generated kernel IR is bit-identical to what the hand-written version produces today. Net change: ~30 lines of hand-rolled bitonic code removed; the sentinel load + write-back wrapper is unchanged, and the rest of the kernel (clamp + key init + bucket walk + phase 2 + phase 3) is untouched.
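
For readers who have not seen the schedule written out, here is a minimal pure-Python sketch of a bitonic key-value sort. It is not the Quadrants implementation: `bitonic_sort_kv_reference` is a hypothetical name, the list index stands in for the lane id, and reading `keys[partner]` stands in for a subgroup shuffle. It only illustrates why a 32-lane sort needs 15 compare-exchange stages (1 + 2 + 3 + 4 + 5).

```python
def bitonic_sort_kv_reference(keys, vals):
    """Sort (key, value) pairs ascending by key with a bitonic network.

    Pure-Python stand-in: the list index plays the role of the lane id, and
    reading keys[partner] plays the role of a subgroup shuffle.
    Returns (sorted_keys, permuted_vals, number_of_stages).
    """
    n = len(keys)
    if n == 0 or n & (n - 1) != 0 or len(vals) != n:
        raise ValueError("need equal-length inputs whose length is a power of two")
    keys, vals = list(keys), list(vals)
    stages = 0
    k = 2
    while k <= n:  # k: length of the bitonic runs being merged
        j = k // 2
        while j >= 1:  # j: partner distance for this compare-exchange stage
            next_keys, next_vals = list(keys), list(vals)
            for lane in range(n):
                partner = lane ^ j
                ascending = (lane & k) == 0
                is_lower = lane < partner
                # The lower lane of an ascending pair keeps the minimum;
                # the roles flip for the upper lane or a descending pair.
                keep_min = is_lower == ascending
                if keep_min:
                    swap = keys[partner] < keys[lane]
                else:
                    swap = keys[partner] > keys[lane]
                if swap:
                    next_keys[lane] = keys[partner]
                    next_vals[lane] = vals[partner]
            keys, vals = next_keys, next_vals
            stages += 1
            j //= 2
        k *= 2
    return keys, vals, stages


if __name__ == "__main__":
    demo_keys = [(i * 37) % 101 for i in range(32)]  # 32 distinct keys
    demo_idx = list(range(32))                       # payload: original slot
    sorted_keys, sorted_idx, stages = bitonic_sort_kv_reference(demo_keys, demo_idx)
    print(stages)  # 15 for 32 lanes
```

Every lane decides independently whether to take its partner's pair, and both lanes of a pair evaluate the same comparison, so ties leave both lanes unchanged. That is the property that lets the real primitive run as straight-line shuffle code with no shared memory.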

log2_size = 5 pins the sort to a 32-lane tile, matching the kernel's hard-coded block_dim = _K = 32. Using the tiled form rather than the bare bitonic_sort_kv(...) wrapper keeps the sort width fixed at 32 even on AMDGPU wave64, where the bare wrapper would otherwise sort across all 64 lanes and mix in garbage from the inactive upper half.
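
The wave64 concern can be modelled in a few lines of plain Python. This is a toy model under stated assumptions, not Quadrants code: `sort_in_tiles` is a hypothetical helper, Python's `sorted` stands in for the sort network, and the "garbage" values in the upper 32 lanes are made up to represent whatever inactive lanes happen to hold.

```python
def sort_in_tiles(lane_keys, log2_size):
    """Sort each aligned tile of 2**log2_size lanes independently.

    Toy model only: `sorted` stands in for the in-register sort network.
    """
    tile = 1 << log2_size
    if len(lane_keys) % tile != 0:
        raise ValueError("lane count must be a multiple of the tile size")
    out = []
    for start in range(0, len(lane_keys), tile):
        out.extend(sorted(lane_keys[start:start + tile]))
    return out


# Lanes 0..31 belong to the 32-thread block; lanes 32..63 are inactive on a
# 64-wide wave and hold arbitrary values (negative here, so they sort first).
real_keys = [(i * 37) % 101 for i in range(32)]
garbage = [-(i + 1) for i in range(32)]
wave64 = real_keys + garbage

tiled = sort_in_tiles(wave64, 5)       # 32-lane tiles, as with log2_size = 5
full_width = sort_in_tiles(wave64, 6)  # one 64-lane sort, as a bare wrapper might do

print(tiled[:32] == sorted(real_keys))       # True: active lanes see only real keys
print(full_width[:32] == sorted(real_keys))  # False: garbage displaced the real keys
```

In this model the full-width sort hands the active lanes nothing but garbage, which is the failure mode the tiled form is meant to rule out.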

Requires the matching quadrants change to be installed (the public symbol qd.simt.subgroup.bitonic_sort_kv_tiled is added by that PR).

Checklist:

  • I read the CONTRIBUTING document.
  • I followed the Submitting Code Changes section of CONTRIBUTING document.
  • I tagged the title correctly (including BUG FIX/FEATURE/MISC/BREAKING)
  • I updated the documentation accordingly or no change is needed.
  • I tested my changes and added instructions on how to test it for reviewers.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

…rt_kv_tiled


7 lines of prose -> 3. Same intent: explain why we use the tiled form with
``log2_size = 5`` rather than the bare ``bitonic_sort_kv`` wrapper.

Pair ``_K = qd.static(32)`` with ``_LOG2_K = qd.static(5)`` at the top
of the kernel and pass ``_LOG2_K`` into ``bitonic_sort_kv_tiled``. The
relationship between the sort width and ``_K`` is now visible at the
binding site instead of being a magic 5 sitting next to ``_K = 32``.

With ``_K`` and ``_LOG2_K`` defined together at the top of the kernel
and ``_LOG2_K`` flowing straight into ``bitonic_sort_kv_tiled``, the
explainer ("pins the tiled sort to _K lanes on every backend ...")
just restates what the names already convey.

``qd.static(32)`` is just the int ``32`` at compile time, so
``_K.bit_length() - 1`` evaluates to ``5`` and keeps ``_K`` and ``_LOG2_K``
in sync if ``_K`` is ever retuned.

``qd.static()`` is a no-op on Python int literals -- it evaluates its
argument at compile time, and a plain ``32`` is already a Python
compile-time int. Several other Genesis solver files wrap kernel-scope
``BLOCK_DIM`` / ``WARP_SIZE`` / ``_K`` constants this way as a defensive
marker, but it doesn't change codegen and the bare ints read more
directly.

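
The ``_K.bit_length() - 1`` derivation mentioned in the commits above relies on ``_K`` being a power of two. A small plain-Python sketch of that arithmetic follows; `log2_exact` is a hypothetical helper for illustration and does not appear in the PR.

```python
def log2_exact(k):
    """Return log2(k) for a positive power-of-two int, else raise ValueError.

    int.bit_length() gives the number of bits needed to represent k, so for
    k == 2**n it returns n + 1; subtracting 1 recovers n.
    """
    if k <= 0 or k & (k - 1) != 0:
        raise ValueError(f"{k} is not a positive power of two")
    return k.bit_length() - 1


K = 32
LOG2_K = log2_exact(K)
assert 1 << LOG2_K == K
print(LOG2_K)  # 5
```

The explicit power-of-two check is an addition for the sketch: for a value such as 48, ``bit_length() - 1`` alone would silently return 5 and the sort width would no longer match the block size.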
Comment thread genesis/engine/solvers/rigid/collider/contact.py
duburcqa previously approved these changes Jun 1, 2026
hugh and others added 2 commits June 4, 2026 14:30
… time

Reverts the qd.static() removal that broke kernel compilation. Without
qd.static, _K = 32 becomes a kernel-local Expr rather than a Python int, so
_K.bit_length() is routed to quadrants.lang.matrix_ops by the AST transformer
and fails with AttributeError. Wrapping in qd.static keeps _K a compile-time
Python int, letting int.bit_length() evaluate to a folded literal.
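
As a rough analogy for the failure described in this commit message, the plain-Python sketch below contrasts a real `int` with a wrapper object that does not expose `bit_length`. `FakeExpr` is a hypothetical stand-in invented for this illustration; it is not a Quadrants class and does not reproduce how the AST transformer actually routes the call.

```python
class FakeExpr:
    """Hypothetical stand-in for a traced kernel value (not a Quadrants type)."""

    def __init__(self, value):
        self.value = value


def try_log2(k):
    """Attempt k.bit_length() - 1 and report an AttributeError instead of raising."""
    try:
        return k.bit_length() - 1
    except AttributeError as exc:
        return f"AttributeError: {exc}"


print(try_log2(32))            # 5: a Python int has bit_length()
print(try_log2(FakeExpr(32)))  # an AttributeError message: the wrapper does not
```

The point of the analogy is only that ``bit_length`` is a method of Python ints, so the expression folds to a literal only while ``_K`` is still a Python int at compile time.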
@github-actions

github-actions Bot commented Jun 4, 2026

⚠️ Abnormal Benchmark Result Detected ➡️ Report

@hughperkins hughperkins marked this pull request as ready for review June 4, 2026 22:16
@hughperkins hughperkins requested a review from YilingQiao as a code owner June 4, 2026 22:16

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 78a6b9fed0

Comment thread genesis/engine/solvers/rigid/collider/contact.py
@duburcqa duburcqa merged commit 2f219c4 into Genesis-Embodied-AI:main Jun 5, 2026
21 of 23 checks passed
@hughperkins hughperkins deleted the hp/use-bitonic-sort-kv branch June 5, 2026 14:07
@hughperkins
Collaborator Author

🙌
